Reconstructing a Random Potential from its Random Walks 
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The problem of how many trajectories of a random walker in a potential are needed to reconstruct 
the values of this potential is studied. We show that this problem can be solved by calculating the 
probability of survival of an abstract random walker in a partially absorbing potential. The approach 
is illustrated on the discrete Sinai (random force) model with a drift. We determine the parameter 
(temperature, duration of each trajectory, ...) values making reconstruction as fast as possible. 



Introduction. Random walks (RW) in random media 
have been intensively studied in the past decades as a 
paradigm for out-of-equilibrium dynamics, and have led 
to the discovery and understanding of important dynami- 
cal effects as anomalous diffusion, ageing ... [10. Briefly 
speaking the issue is to determine the statistical prop- 
erties of the walker from the ones of the energy poten- 
tial. Much less attention has been devoted to the in- 
verse problem: given one (or more) observed RW(s) can 
we guess the potential values? This question naturally 
arises in biophysics where the use of AFM, optical and 
magnetic tweezers make possible the mechanical separa- 
tion of single protein-protein complexes [31, or the un- 
folding and refolding of single biomoleculesjj, [5, 6]. The 
observed dynamics the rupture of chemical bonds, of fold- 
ing/unfolding of nucleic acids, or proteins can be mod- 
eled as a RW motion affected by thermal noise, mov- 
ing in a quenched potential determined by the composi- 
tion of the chemical bonds, or the sequence of amino- or 
nucleic-acids. Reconstructing the free energy landscape 
of those processes is the object of current and intense 
efforts 0,11,0, iij. 

In this letter we show how the inverse RW problem can 
be practically solved within the Bayesian inference frame- 
work and address the crucial question of the accuracy of 
reconstruction. In practice information can be accumu- 
lated either by increasing the duration of one RW, or 
observing more than one RW, or combining the two. We 
discuss the optimal procedure minimizing the total num- 
ber of data to be acquired, and show how this minimal 
amount of data can be calculated from the probability 
of survival of an abstract walker in a partially absorb- 
ing potential. The approach is illustrated in detail on 
the celebrated discrete random force (RF) model (Sinai 
model with non zero drift) [l|, |2|. 

Inference is a key issue in information theory and 
statistics [T3| , with applications in biology , social sci- 
ence [Hj], finance, ... A central question is the so-called 
hypothesis testing problem: which one of two candidate 
distributions is likely to have generated a set of measured 
data? This question was solved in the case of indepen- 
dent variables by Chernoff [HI], and is the core issue of 
the asymptotic theory of inference . Chernoff showed 
that the probability of guessing the wrong distribution 
decreases exponentially with the size of the data set [l3[ . 
Large deviations techniques can be used to treat the case 



of variables extracted from one recurrent realization of 
a finite Markov chain [3, EH ; the present work can be 
seen as an extension to many transient realizations of an 
'infinite' chain. 

Random Force model. For an illustration of the prob- 
lem consider the discrete, one dimensional RF model 
defined on the set of sites x = 0,1,2, ... ,N 1]. We 
start by choosing randomly a set of dimensionless forces 
f x = ±1 on each link (x,x + 1) with a priori proba- 
bility P a = l\ x w here -I < b < I is called tilt. 
This defines the values of the potential V on each site, 
V x = -J2 y<x fy ( b y definition V = Q). An example of 
potential for b = 0.4 is shown on Fig. Q] 

After the quenched potential has been drawn a ran- 
dom walker starts in x — at time t = 0. The 
walker then jumps from one site x to one of its neigh- 
bors x' = x ± 1 with rate (probability per unit of time) 
rv(x — * x') = r x e^ Vm ~ Vx ''' ^ 2T ' to satisfy detailed bal- 
ance at temperature T; the attempt rate ro will be set to 
unity in the following. Reflecting boundary conditions 
are imposed by setting Vn+i = V-\ = +oo. We reg- 
ister the sequence of of positions up to some time tf. 
X = {x(t),0 < t < t f }. Figure Q] shows five RWs X p , 
p = 1, . . . , 5 , each starting in the origin x(0) = and of 
equal duration tf for a temperature T = 1. The value of 
the temperature strongly affects the dynamics 0] , and its 
relevance for the inverse problem will be discussed later. 

Our objective is to reconstruct the potential over a re- 
gion of the lattice e.g. the value of the forces on some 
specific links from the observation of RWs. Within Bayes 
inference framework this can be done by maximizing the 
joint probability of the potential V and of the observed 
RWs Xi,...,X R over V pjj. P is the product of the 
a priori probability of the potential, Pq, times the likeli- 
hood of the RWs given the potential, L. Since the RW 
is Markovian L depends only on the sets of total times 
t x spent on every site x, and of the numbers of jumps 
u(x — > x') from x to x' over the set of RWs: 



(1) 



where the product runs over all sites x and their neigh- 
bors x' — x ± 1. Expressing the rates in terms of the 
forces and maximizing the joint probability P we obtain 
the most likely values for the forces: f x = sign(/i x + a) 
where a = T ln[(l + &)/(! — b)] is a global 'field' coming 
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FIG. 1: Left, top: Example of potential V obtained in the 
RF model with tilt b = 0.4 (size N = 1000, sites x > 200 
not shown here). Right: examples of RWs, numbered from 1 
to 5, in this potential at temperature T = 1; plateaus are in 
correspondence with the local minima of V . Here a ~ 0.85 
(creep phase). Left, bottom: Predictions from the first R 
RWs in the right panel and ©; impulses locate incorrectly 
predicted forces f x for x < 200. The number of erroneous 
forces decreases from 26 (for R = 1) to (R = 5). Note the 
errors on sites xq ~ 100 appearing when the fourth RW is 
taken into account; indeed this atypical RW marks no pause 
in the local minimum in xq. 



from the a priori distribution Pq and h x a local contri- 
bution due to the likelihood L, 



([T]) where ■ denotes the scalar product. Maximizing the 
joint probability P = Po x L over the potential becomes 
equivalent, in the large R limit, to finding v with the 
largest scalar product with the signal s |2fJ • It is natural 
to partition the space of signals into 'Voronoi cells': C v 
is the set of s having a larger scalar product with v than 
with any other potential v'. Bayes rule tells us that the 
most likely potential given an observed signal s is the one 
attached to the cell in which s lies. 

Consider now RWs taking place in a given potential 
V. From the law of large number the signal s is equal, in 
the infinite R limit, to s* = {t* , u*(x — > x 1 ) = t* r-y(x — > 
x')} where t* is the average sojourn time on site x over 
RWs of duration tf. As s* E C v [21| reconstruction be- 
comes flawless in the limit of an infinite number of data 
as expected. For large albeit finite R, s typically deviates 
from s* by 0{R~^)\ finite deviations have exponentially 



small in R probabilities, e 



controlled by a rate 



function c^v( s ) [14]. The probability to predict an er- 
roneous potential is the probability that the stochastic 
signal s does not belongs to cell C v . This probability 
of error thus decays exponentially with R over a typical 
number of RWs 



Rc(V) 



| min ujv(s) 



(3) 
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h x = 2Tsinh (— ) (t x+1 -t x )- 



■u{x 



x+1)— u(x+l — » x) 
(2) 

Figure Q] (left, bottom) shows predictions made from R = 
1 to R = 5 RWs for the first 200 sites. The duration tf 
of the RW is chosen to be much larger than the mean 
first passage time in x = 200, and much smaller than 



where the minimum is taken over signals outside the 
'true' cell. It depends on the temperature, the duration 
ofthcRW, ... 

As the RWs are independently drawn o>v is a convex 
function of s 2M ■ The minimum in (J3]) is thus reached 
on the boundary between the true cell and another, bad 
cell, say, Cv- The attached potential, V, is the most 
'dangerous' one from the inference point of view. RWs 
generated from V and V are hardly told from each other 
unless more than R C (V) of them are observed. 

Assume V is known. Then the boundary between C v 
and Cv is the set of signals s _L v — v. We deduce 



the equilibration time t. 



eg 



a bN/T 



In this range the # C (V) = [ max min (wy(s) + fis ■ (v - v)) ] 



(4) 



quality of prediction is essentially independent of tf as 
will be discussed in detail below. As expected the number 
of erroneous forces decreases with increasing R though 
atypical events may produce flaws in the prediction. The 
analysis of these atypical RWs, and how they lead to 
errors is the keystone of what follows. 

Number of RWs necessary for a good reconstruction. 
Expression ^ for the likelihood of the RWs is true for 
any potential V and can be geometrically interpreted as 
follows. Given a set of RWs we extract a signal vector S 
whose components are: the times t x spent on site x, the 
numbers u(x — > x') of transitions from site x to site x' . 
When R is large we expect S to be extensive with R and 
define the intensive signal s = S/R. Similarly, to each 
potential V we associate a vector v with components: 
minus the outgoing rate i.e. —J2 X '(=i x ) r v{x —> %') f° r 
each site x, the logarithm of the rate ry(s — ► x') for 
each pair of neighbors. Then L = exp(i? s • v) from 



where the Lagrange multiplier fi £ [0; 1] ensures that s 
is confined to the boundary. The Legendre transform of 
c^v appearing in ([4J is intimately related to the evolution 
operator of an abstract random walk process, denoted by 
RW(/i) to distinguish from the original RW [16]. This 
RW(/x)-er moves with the rates r^ 1 _ fi - ) - v+fl -^(x — > x') and 
may die on every site x with positive rate 



c') +^ir v (i: -> x') 



r (l- t i)V+ti'v( x ~* x ')] 



(5) 



Consider now the probability 7r(/z) that RW(^)-er, 
initially at the origin, has survived up to time tf 
(the duration of the original RW). Then i? c (V) = 

min l/|ln7r(/i)|. 

/*e[o ; i] 



Optimal Working Point for the RF model. We apply 
the above theory to the discrete RF model, and want to 
predict the value of the force f y on the link (y, y + 1) for 
some specific y. The dangerous potential is V obtained 
from V upon reversal of the force f y — » —f y . We aim at 
calculating the probability 7r(//) of survival of RW(/i)-er 
moving with rate r(x — » x') = ry(x — > a;') and dying 
on site x with rate d(x) = except: r(y — > y + 1) = 
l/r(y + l - y) = eCl-W./Car), %) = D(f y ),d(y + 
1) = £>(-/») where £>(/) = (1 - M )e^ (2T > + ,«r'/< 2T > - 
e (i-2w//(2T) f rom (j5j)^ From the previous section the 
number of RWs required for a reliable prediction of f y is 
fl c (y;V)=mi % l/|ln7r(M)|. 

Let 7r x (/x, i) be the probability that RW(/x), initially on 
site x, is still alive at time t. The time-evolution of tt x is 
described by 



(6) 



with initial condition 7r y (/x,0) = 1 (by convention 7r_i = 
TTiV+i = 0). After Laplace transform over time, eqns 
([6]) are turned into recurrence equations for the ratios 
tt x /tt x+ i and solved with great numerical accuracy. We 
obtain this way the probability of survival, 7r(/z) = 
7To(/x,t/), and optimize over /i. Though R c depends on 
the potential V its general behavior for tilt b > as a 
function of the duration tj is sketched in Fig. [5] Three 
regimes are observed: 

• for tf < ^ (mean first passage time in y) RW(/i) 
has a low probability to visit y and is almost surely alive, 
hence R c is very large; 

• for T y -C tf <C t eq RW(^t) has visited the region 
surrounding y and escaped from this region (transient 
regime), hence its probability of survival remains con- 
stant, and so does R c ; 

• for tf ^> t eq RW(/i) visits again and again the region 
surrounding y, hence the probability of survival decreases 
exponentially with the duration: R c cx 

The total time R c xtf for a good reconstruction is min- 
imal when we choose tf > r y . This marginally transient 
regime corresponds to the plateau of Fig. [2j RWs are 
long enough to visit site y but short enough not to wan- 
der much away from y. To calculate the corresponding 
value of R c we take the limits, in order, N — ► oo, tf — > oo, 
and look for the stationary solution of ([6]) with boundary 
condition tTx^oq = 1. The result for the probability of 
survival is 

_ ii 

e t 

7T(yu) = —j- -y- ^jr , 

l-^ + fie t + n(l — ^) t* +1 (e4T — e 
where the mean sojourn time on site y + 1 in V is |2| 



i = exp 



z>0 



1 ( Vy+z+2 + Vy+z+1 



T 



V, 



y+i 



(8) 



Distribution of R c over potentials. The number 
R c (y\ V) of RWs necessary to predict the value of f v de- 




FIG. 2: Sketch of the number i? c (y;V) of RWs necessary 
for a good inference of the force f y as a function of the RW 
duration tf. t v is the typical first-passage time in y from 
the origin, t eq the equilibration time (comparable to the first- 
passage time from the extremity N when y N). Inset: rate 
of reconstruction (J2J) as a function of temperature at fixed tilt. 



pends on the potential V through the sojourn time t* +1 
(|8]l. By randomly drawing potentials (or varying site y) 
we obtain the distribution of R c shown in Fig. [3] Main 
features are: 

• Small R c correspond to sites where the RW spends 
long time f* (traps) [23]: R c ~ jt^~^ ~ urV from ©• 
The power law tail of the distribution of sojourn times, 
P(t*) ~ (t*)~( Q+1 ) gives rise to an essential singular- 
ity at the origin in the cumulative distribution, Q(R C ) ~ 
e~ a l Rc . The potential is easy to predict over trapping 
regions since RWer spends a long time there, and accu- 
mulates information about the energy landscape. 

• Conversely the largest value of R c , denoted by R^ , 
correspond to the homogeneous potential V x — —x in 
which the walker is never trapped and is quickly driven to 
+oo. R™ can be calculated from by setting f x = +1 
for all sites in (jSj). The singularity in Q when R c — » 
Rc corresponds to quasi-homogeneous potentials, where 
one force, say, on site £, is —1. Such potentials have 
exponential-in-£ small probabilities, but give values of 
R c on site y — exponentially close to Rf . On the 
overall we find 1 — Q{R^ — e) ~ e' 3 where the exponent 
is = Tln^. 

• In between Q shows marked steps at well defined and 
b- independent values of R c , which correspond to specific 
local force patterns beyond site y. A ^-pattern is defined 
as a sequence of forces on sites y+1 to y+£+l, followed by 
all + forces; the corresponding R c can be exactly calcu- 
lated from (|7l8p . and is shown for 7 among the 16 I = 4- 
patterns in Fig. [3l The histogram of R c can be accurately 
approximated for any tilt b > based on the above lo- 
cal pattern description. Given a length I we enumerate 
all the 2 l patterns, calculate the corresponding R c , and 
weight them with probability (iH)#/*=+ x (^)# f *=-. 
In practice we choose i ~ 10/ ln[2/(l — b)), to ensure that 
patterns with more than t negative forces have negligible 
weights (< e~ 10 ). The resulting histograms are in ex- 
cellent agreement with Q for intermediate values of R c 
(dashed lines in Fig. [3]). 
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FIG. 3: Cumulative probability distribution Q of R c (y, V) at 
temperature T = 1 and for three tilt values b. Full lines are 
numerical results from 10 6 samples, and dashed lines are the 
outcomes from the ^-pattern approximation. Inset: R c vs. T 
for the 3-patterns + + +, — h +, (from top to down). 



Tuning temperature for fast reconstruction. The de- 
pendence of R c upon temperature is shown for three pat- 
terns in the Inset of Fig. [3l We have R c ~ AT as T — > oo 
independently of the pattern, and R c ~ 2T/(/i + 3) when 
T — > where h is the highest barrier to the right of y in 
the potential defined by the pattern (Fig. [3]) . When the 
temperature exceeds the temperature Tf, such that a = 1 
the velocity of the RWer is finite f- ~ v(T) > 2}. The 

reconstruction rate (number of correctly predicted forces 
per unit of time) is equal to the velocity v(T) divided by 
R c , 

u {T ) = i^l±i^4 x r" dRc om 

cosh 2T — b sinh ^ Jo °c 
after averaging over the quenched potential. The depen- 



dence of v upon temperature is sketched in the Inset of 
Fig. [2j it is maximal and equal to v M for some tempera- 
ture T M realizing a trade-off between fast motion (large 
velocity) and accurate reading-out (small R c ). Even in 
the small b limit the optimal reconstruction rate is finite, 
v ~ b 2 , by working at high temperature T ~ \ , while 
in the absence of optimization procedure the number of 
predicted forces scales only as the squared logarithm of 
the time [l9j . 

Conclusion. We have shown how the number of RWs 
required for a good reconstruction of the potential can be 
deduced from the probability of survival of an absorbing 
RW process. This result is of practical interest since the 
survival probability can be estimated through numerical 
simulations e.g. in dimension > 2. Furthermore we have 
determined, for the special case of the RF model, the op- 
timal 'experimental' protocol for reconstruction (number 
of RWs, duration, temperature). 

Our formalism applies to continuously parametrized 
potentials e.g. RF model with forces taking continu- 
ous instead of binary values. The aim is now to predict 
the true potential values up to some accuracy on each 
site; this is turn determines an acceptable neighborhood 
around s* in the space of signals. The rate function lu v 
is generically parabolic around s* , with a curvature ma- 
trix called Fisher information matrix [l(|. Finding R c 
amounts to minimize this (positive) quadratic form on 
the boundary of the neighborhood, a task which can be 
carried out efficiently [l7j . Our approach can be easily 
extended to the case of a finite delay between two mea- 
sures of the positions, and Chernoff's result is recovered 
in the finite N, infinite delay limits (5, H3|. 
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